CEPiNS: Conserved Exon Prediction in Novel Species
Authors
Abstract
Exon structure is relatively well conserved among orthologs in several large clades of species (e.g., Mammalia, Diptera, Lepidoptera) across evolutionary distances of up to 80 million years. Thus, it should be straightforward to predict the exon structures of novel species from the known exon structures of species whose genomes have been sequenced and well assembled. Predicting exon boundaries in the genes of novel species is important given the rapidly growing number of transcriptome sequencing projects. CEPiNS is a new pipeline that mines exon boundaries from the predicted gene sets of model species and then uses this information to identify exon boundaries in a novel species through codon-based alignment. The pipeline uses two freely available components of NCBI's toolkit: SPIDEY, an exon boundary prediction tool, and BLAST (BLASTN, BLASTP, TBLASTX). CEPiNS provides an important tool for analyzing the transcriptomes of novel species.
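The central step described above, transferring known exon boundaries onto an orthologous transcript through an alignment, can be pictured as a coordinate projection. The Python below is a minimal sketch under stated assumptions, not CEPiNS's actual implementation. It assumes the model species' coding exons are given as nucleotide lengths, that a gapped protein alignment between the model protein and the translated novel transcript is already available (for example, derived from a BLASTP or TBLASTX hit), and that the novel coding sequence starts in frame at offset 0. The function names `exon_boundaries`, `residue_map`, and `project_boundaries` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical helpers illustrating boundary projection; not CEPiNS code.


def exon_boundaries(exon_lengths: List[int]) -> List[int]:
    """0-based CDS nucleotide offsets where one exon ends and the next begins.

    The end of the last exon is not an exon-exon boundary, so it is skipped.
    """
    bounds: List[int] = []
    pos = 0
    for length in exon_lengths[:-1]:
        pos += length
        bounds.append(pos)
    return bounds


def residue_map(aln_model: str, aln_novel: str) -> Dict[int, Optional[int]]:
    """Map each model residue index to its aligned novel residue index.

    Inputs are gapped protein strings of equal length ('-' marks a gap).
    A model residue aligned to a gap in the novel sequence maps to None.
    """
    if len(aln_model) != len(aln_novel):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    i = j = 0  # residue counters for model (i) and novel (j)
    for a, b in zip(aln_model, aln_novel):
        if a != "-":
            mapping[i] = j if b != "-" else None
            i += 1
        if b != "-":
            j += 1
    return mapping


def project_boundaries(
    exon_lengths: List[int], aln_model: str, aln_novel: str
) -> List[Optional[int]]:
    """Project model exon boundaries onto novel CDS nucleotide offsets.

    Each boundary is split into a codon index and a phase (0, 1 or 2); the
    codon is mapped through the alignment and the phase is kept unchanged.
    Boundaries whose codon falls in a novel-sequence gap become None.
    """
    mapping = residue_map(aln_model, aln_novel)
    projected: List[Optional[int]] = []
    for b in exon_boundaries(exon_lengths):
        codon, phase = divmod(b, 3)
        j = mapping.get(codon)
        projected.append(None if j is None else 3 * j + phase)
    return projected
```

For example, a model gene with coding exons of 10, 8 and 12 nt whose protein aligns to a novel ortholog carrying a two-residue insertion gives `project_boundaries([10, 8, 12], "MKVL--AGTRPE", "MKVLQQAGTRPE")`, which returns `[10, 24]`: the first boundary lies upstream of the insertion and keeps its offset, while the second shifts by 6 nt. A boundary whose codon is gapped out in the novel sequence is returned as `None`, because the alignment alone gives no position for it; this sketch leaves such cases unresolved.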
Similar resources
Use of novel proteins in developing a Staphylococcus aureus vaccine
Background: Staphylococcus aureus and Staphylococcus epidermidis are major human pathogens of increasing importance due to the spread of antibiotic resistance. Novel potential targets for therapeutic antibodies are products of staphylococcal genes expressed during human infection. Previously, the secreted and surface-exposed proteins among seroreactive antigens have been discovered. Furthermore...
Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure
MOTIVATION Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other. RESULTS We present a method for combining ge...
Identification of microRNAs in corpus luteum of pregnancy in buffalo (Bubalus bubalis) by deep sequencing
This study was aimed to identify miRNAs of corpus luteum (CL) in buffaloes during pregnancy. For this study, CL (n=2) were collected from gravid uteri of buffalo and RNA was isolated. Following this, the purity and integrity of RNA was checked and used for deep sequencing using Illumina Hiseq 2500 platform. The reads’ quality was checked prior to in silico analyses viz. identification of conser...
In silico investigation of lactoferrin protein characterizations for the prediction of anti-microbial properties
Lactoferrin (Lf) is an iron-binding multi-functional glycoprotein which has numerous physiological functions such as iron transportation, anti-microbial activity and immune response. In this study, different in silico approaches were exploited to investigate Lf protein properties in a number of mammalian species. Results showed that the iron-binding site, DNA and RNA-binding sites, signal pepti...
Recognition of Unknown Conserved Alternatively Spliced Exons
The split structure of most mammalian protein-coding genes allows for the potential to produce multiple different mRNA and protein isoforms from a single gene locus through the process of alternative splicing (AS). We propose a computational approach called UNCOVER based on a pair hidden Markov model to discover conserved coding exonic sequences subject to AS that have so far gone undetected. A...